75 research outputs found

    An Open source Implementation of ITU-T Recommendation P.808 with Validation

    The ITU-T Recommendation P.808 provides a crowdsourcing approach for conducting a subjective assessment of speech quality using the Absolute Category Rating (ACR) method. We provide an open-source implementation of the ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended our implementation to include the Degradation Category Rating (DCR) and Comparison Category Rating (CCR) test methods. We also significantly speed up the test process by integrating the participant qualification step into the main rating task, rather than using a two-stage qualification and rating solution. We provide program scripts for creating and executing the subjective test, and for cleansing and analyzing the collected answers, to avoid operational errors. To validate the implementation, we compare the Mean Opinion Scores (MOS) collected through our implementation with MOS values from a standard laboratory experiment conducted based on the ITU-T Rec. P.800. We also evaluate the reproducibility of the results of the subjective speech quality assessment through crowdsourcing using our implementation. Finally, we quantify the impact of the parts of the system designed to improve reliability: environmental tests, gold and trapping questions, rating patterns, and a headset usage test.
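
    As a rough illustration of the quantity being compared above, the following sketch computes a MOS and an approximate 95% confidence interval from a list of ACR ratings. It is not taken from the authors' scripts: the function name `mos_with_ci`, the sample ratings, and the normal approximation (z = 1.96, where a t-distribution may be preferable for small samples) are all assumptions made for this example.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, CI half-width) for ACR ratings on a 1..5 scale.

    Hypothetical helper: uses a normal approximation for the
    95% confidence interval, which is an assumption of this sketch.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a CI")
    mos = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(n)
    return mos, half_width


if __name__ == "__main__":
    # Made-up ratings for two hypothetical test conditions.
    ratings_by_condition = {
        "condition_a": [5, 4, 5, 4, 5, 4, 5, 5],
        "condition_b": [2, 3, 2, 1, 2, 3, 2, 2],
    }
    for name, ratings in ratings_by_condition.items():
        mos, hw = mos_with_ci(ratings)
        print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

    Per-condition MOS values obtained this way from a crowdsourcing test and from a laboratory test could then be compared condition by condition.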

    Transformation of Mean Opinion Scores to Avoid Misleading of Ranked based Statistical Techniques

    The rank correlation coefficients and the rank-based statistical tests (a subset of non-parametric techniques) might be misleading when they are applied to subjectively collected opinion scores. Those techniques assume that the data is measured at least at an ordinal level, and they assign a tied rank to a set of scores only when the scores have exactly the same numeric value. In this paper, we show that this definition of tied rank is not suitable for Mean Opinion Scores (MOS) and might lead rank-based statistical techniques to misleading conclusions. Furthermore, we introduce a method to overcome this issue by transforming the MOS values considering their 95% Confidence Intervals. The rank correlation coefficients and rank-based statistical tests can then be safely applied to the transformed values. We also provide open-source software packages in different programming languages to facilitate the application of our transformation method in the quality of experience domain. Comment: This paper has been accepted for publication in the 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

    Application of Just-Noticeable Difference in Quality as Environment Suitability Test for Crowdsourcing Speech Quality Assessment Task

    Crowdsourcing micro-task platforms facilitate subjective media quality assessment by providing access to a highly scalable, geographically distributed, and demographically diverse pool of crowd workers. Those workers participate in the experiment remotely from their own working environment, using their own hardware. In the case of speech quality assessment, preliminary work showed that environmental noise at the listener's side and the listening device (loudspeaker or headphone) significantly affect perceived quality, and consequently the reliability and validity of subjective ratings. As a consequence, ITU-T Rec. P.808 specifies requirements for the listening environment of crowd workers when assessing speech quality. In this paper, we propose a new Just Noticeable Difference of Quality (JNDQ) test as a remote screening method for assessing the suitability of the work environment for participating in speech quality assessment tasks. In a laboratory experiment, participants performed this JNDQ test with different listening devices in different listening environments, including a silent room according to ITU-T Rec. P.800 and a simulated background noise scenario. Results show a significant impact of the environment and the listening device on the JNDQ threshold. Thus, the combination of listening device and background noise needs to be screened in a crowdsourcing speech quality test. We propose a minimum threshold of our JNDQ test as an easily applicable screening method for this purpose. Comment: This paper has been accepted for publication in the 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

    Design optimization of switched reluctance motor for noise reduction

    An electromagnetic-structural simulation model of the switched reluctance motor (SRM) is introduced, based on the finite element method (FEM) and built with the ANSYS finite element (FE) package. Since the main cause of noise and vibration in the SRM is the radial force applied to the stator poles, a 2D FE transient analysis is carried out in the electromagnetic model to predict the instantaneous radial force. Based on 3D FEM, a modal analysis is performed in the developed structural model to determine mode shapes and natural frequencies. Using the developed simulation model and an evolutionary algorithm, a method is proposed for design optimization of the SRM to decrease noise. To evaluate the proposed method, simulation results are presented for an 8/6 switched reluctance motor.

    Multi-dimensional Speech Quality Assessment in Crowdsourcing

    Subjective speech quality assessment is the gold standard for evaluating speech enhancement processing and telecommunication systems. The commonly used standard ITU-T Rec. P.800 defines how to measure speech quality in lab environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec. P.835 extends P.800 to measure the quality of speech in the presence of noise. ITU-T Rec. P.804 targets the conversation test and introduces perceptual speech quality dimensions which are measured during the listening phase of the conversation. The perceptual dimensions are noisiness, coloration, discontinuity, and loudness. We create a crowdsourcing implementation of a multi-dimensional subjective test following the scales from P.804 and extend it to include reverberation, the speech signal, and overall quality. We show the tool is both accurate and reproducible. The tool has been used in the ICASSP 2023 Speech Signal Improvement challenge and we show the utility of these speech quality dimensions in this challenge. The tool will be publicly available as open-source at https://github.com/microsoft/P.808

    VCD: A Video Conferencing Dataset for Video Compression

    Commonly used datasets for evaluating video codecs are all very high quality and not representative of video typically used in video conferencing scenarios. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and spatial and temporal information. It includes both desktop and mobile scenarios and two types of video background processing. We report the compression efficiency of H.264, H.265, H.266, and AV1 in low-delay settings on VCD and compare it with the non-video conferencing datasets UVC, MLC-JVC, and HEVC. The results show the source quality and the scenarios have a significant effect on the compression efficiency of all the codecs. VCD enables the evaluation and tuning of codecs for this important scenario. The VCD is publicly available as an open-source dataset at https://github.com/microsoft/VCD

    Towards speech quality assessment using a crowdsourcing approach: evaluation of standardized methods

    Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms, tasks that need human intelligence can be resolved by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms of comparability of results and efficiency. For this purpose, three pairs of listening-only tests have been carried out using three different crowdsourcing platforms and following the ITU-T Recommendation P.808. In each test, listeners judge the overall quality of the speech sample following the Absolute Category Rating procedure. We compare the results of the crowdsourcing approach with the results of standard laboratory tests performed according to the ITU-T Recommendation P.800. Results show that in most cases, both paradigms lead to comparable results. Notable differences are discussed with respect to their sources, and conclusions are drawn that establish practical guidelines for crowdsourcing-based speech quality assessment.

    Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs

    Machine learning-based video codecs have made significant progress in the past few years. A critical need in the development of ML-based video codecs is an accurate evaluation metric that does not require an expensive and slow subjective test. We show that existing evaluation metrics that were designed and trained on DSP-based video codecs are not highly correlated with subjective opinion when used with ML video codecs, because the video artifacts differ considerably between ML and DSP-based video codecs. We provide a new dataset of ML video codec videos that have been accurately labeled for quality. We also propose a new full reference video quality assessment (FRVQA) model that achieves a Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation Coefficient (SRCC) of 0.99 at the model level. We make the dataset and FRVQA model open source to help accelerate research in ML video codecs, and so that others can further improve the FRVQA model.
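
    For readers unfamiliar with the two figures of merit quoted above, the sketch below shows one common way to compute PCC and SRCC between predicted and subjective scores. It is a generic illustration in plain Python, not the authors' evaluation code: the function names and the sample scores are hypothetical, and "model level" is assumed here to mean one averaged score pair per codec model.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; exactly equal values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up per-model scores (one pair per hypothetical codec model).
    subjective_mos = [1.8, 2.6, 3.1, 3.9, 4.4]
    predicted = [2.0, 2.5, 3.3, 3.8, 4.5]
    print("PCC :", round(pearson(subjective_mos, predicted), 3))
    print("SRCC:", round(spearman(subjective_mos, predicted), 3))
```

    PCC measures how linear the relationship between predicted and subjective scores is, while SRCC only checks whether the two orderings agree.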